Performance Evaluation of CSMT for VLIW Processors

Authors

Manoj Gupta

Josep Llosa

Fermín Sánchez

Abstract

Clustered VLIW embedded processors have become widespread thanks to their simple hardware and low power consumption. However, while some applications exhibit large amounts of instruction-level parallelism (ILP) and benefit from very wide machines, others have little ILP and leave precious resources idle in wide processors. Simultaneous MultiThreading (SMT) is a well-known technique that improves resource utilization by exploiting thread-level parallelism at instruction-level granularity. However, implementing SMT for VLIWs requires complex structures. CSMT (Cluster-level Simultaneous MultiThreading) allows some degree of SMT in clustered VLIW processors. CSMT takes the set of operations that execute simultaneously in a given cluster (called a bundle) as the assignment unit. All bundles belonging to a VLIW instruction from a given thread are issued simultaneously. To minimize cluster conflicts between threads, a very simple hardware-based cluster renaming mechanism is proposed. The experimental results show that CSMT significantly improves ILP compared with other multithreading approaches suited to VLIW. For instance, with 4 threads CSMT shows an average speedup of 113% over a single-thread VLIW architecture and 36% over Interleaved MultiThreading (IMT). In some cases, the speedup is as high as 228% over the single-thread architecture and 97% over IMT. In addition, CSMT on a 2-thread processor achieves almost the same performance as IMT on a 4-thread processor, and outperforms it in some cases.
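The issue rule described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how cluster renaming works or how threads are prioritized, so the sketch assumes a fixed rotation in which thread t's logical cluster c maps to physical cluster (c + t) mod N, and a fixed priority by thread index. The names `rename` and `issue_cycle` are hypothetical. Each thread's next VLIW instruction is modeled as a bitmask of the clusters for which it carries a bundle, and a thread issues only if all of its bundles fit at once.

```python
from typing import List, Optional


def rename(mask: int, offset: int, n_clusters: int) -> int:
    """Rotate a cluster-usage bitmask left by `offset` positions.

    Assumption: renaming is a fixed rotation of cluster indices.
    """
    offset %= n_clusters
    full = (1 << n_clusters) - 1
    return ((mask << offset) | (mask >> (n_clusters - offset))) & full


def issue_cycle(ready: List[Optional[int]], n_clusters: int) -> List[int]:
    """Return the threads whose next VLIW instruction issues this cycle.

    ready[t] is a bitmask of the logical clusters used by thread t's next
    instruction (bit c set means it has a bundle for cluster c), or None
    if thread t has nothing to issue. Threads are tried in index order
    (assumed fixed priority).
    """
    occupied = 0          # physical clusters already claimed this cycle
    issued: List[int] = []
    for t, mask in enumerate(ready):
        if mask is None:
            continue
        physical = rename(mask, t, n_clusters)  # assumed offset = thread id
        # All bundles of the instruction must issue together, so the
        # instruction is accepted only if none of its clusters is taken.
        if physical & occupied == 0:
            occupied |= physical
            issued.append(t)
    return issued


if __name__ == "__main__":
    # Two threads that both use only logical cluster 0 on a 4-cluster machine.
    print(issue_cycle([0b0001, 0b0001], 4))
```

In this example both threads would collide on cluster 0 without renaming; under the assumed rotation, thread 1's bundle lands on physical cluster 1, so both instructions can issue in the same cycle.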

Similar resources

Aligned Scheduling: Cache-Efficient Instruction Scheduling for VLIW Processors

The performance of statically scheduled VLIW processors is highly sensitive to the instruction scheduling performed by the compiler. In this work we identify a major deficiency in existing instruction scheduling for VLIW processors. Unlike most dynamically scheduled processors, a VLIW processor with no load-use hardware interlocks will completely stall upon a cache miss of any of the operations...

Evaluating Compiler Support for Complexity Effective Network Processing

Statically scheduled processors are known to enable low complexity hardware implementations that lead to reduced design and verification time. However, statically scheduled processors are critically dependent on the compiler to exploit instruction level parallelism and deliver higher performance. In order to ascertain the suitability of statically scheduled processors for network processing (wh...

Modeling and Performance Evaluation of Multi-Processors Organization with Shared Memories

This paper is primarily concerned with the theoretical evaluation of the performance of multiprocessor systems. A Markovian waiting-line model has been developed for various multiprocessor configurations with shared memory. The system is analysed at the request level rather than the job level.

Practical Precise Evaluation of Cache Effects on Low Level Embedded VLIW Computing

The introduction of caches inside high performance processors provides technical ways to reduce the memory gap by tolerating long memory access delays. While such intermediate fast caches accelerate program execution in general, they have a negative impact on the predictability of program performance. This lack of performance stability is an undesirable characteristic for embedded computing. W...

Implementing Click IP Router Kernel on VLIW Architectures

In this work, we implemented the Click IP Router Kernel in C language provided by Scott Webber et al. for two VLIW processors designed for DSP purposes, namely the Philips Trimedia TM1300 processor and the Texas Instruments TMS320C6701 processor. The performance of these processors is compared with that of three other processors, ARM SA-110, HPL-PD EPIC, and Intel IXP1200 [1]. Ways of further perfo...

Publication date: 2007
